
NBLAST score matrix builder v2 #69

Merged: 21 commits merged into navis-org:master on Apr 12, 2022

Conversation

@clbarnes (Collaborator)

Rising from the grave...

Differences from #28 :

  • Actually handles half-open intervals properly rather than just ignoring the brackets
  • The lookup tables take Digitizer objects rather than raw boundaries. This better encapsulates the logic for parsing and stringifying the digitizers, and makes the handling of left/right half-open boundaries and value clamping more flexible. It's a bit more verbose, but most people will be working with existing CSVs anyway, so it'll all be hidden.
    • Digitizers can be created from linear sequences (useful for dot products), geometric sequences (useful for distances), and from quantiles of existing data
    • Digitizers force the interval to cover -inf to inf. By default they replace the first and last bounds with -inf and inf, but still store the original values so that they can be used like ScoringFunction.max_dist; alternatively, they can prepend -inf and append inf instead (see the sketch after this list)
  • I haven't actually incorporated it into the {N,Syn}Blaster yet
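
A minimal sketch of that clamping behaviour, in plain numpy rather than the PR's actual Digitizer API (make_boundaries and its signature are invented here for illustration):

```python
import numpy as np

# Boundaries are forced to cover (-inf, inf): either the extreme bounds
# are replaced (the default described above), or -inf/inf are pre/appended.
def make_boundaries(bounds, replace_extremes=True):
    bounds = np.array(bounds, dtype=float)
    original = (bounds[0], bounds[-1])  # kept around, cf. ScoringFunction.max_dist
    if replace_extremes:
        bounds[0], bounds[-1] = -np.inf, np.inf
    else:
        bounds = np.concatenate([[-np.inf], bounds, [np.inf]])
    return bounds, original

# Linear sequence, useful for dot products in [0, 1]:
dot_bounds, _ = make_boundaries(np.linspace(0, 1, 11))
# Geometric sequence, useful for distances:
dist_bounds, (_, max_dist) = make_boundaries(np.geomspace(0.1, 100, 10))
print(max_dist)  # 100.0, the original upper bound is still available

# Half-open intervals: np.digitize bins as [low, high) by default.
print(np.digitize([0.05, 0.5, 0.999], dot_bounds) - 1)  # [0 5 9]
```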

And a big training dataset (L/R pairs in the L1 brain) is within reach thanks to @swilson27.

Open question: is ScoringFunction sufficiently internal that I can just replace it with the new Lookup2d class, or should it be kept around for compatibility?

@schlegelp (Collaborator)

I think that ScoringFunction could reasonably be replaced.

@clbarnes (Collaborator, Author)

Order of operations is another possible question. Currently, while building the lookup table, each query-target pair is assessed and its array of output scores is digitized into the array of counts. This keeps memory overhead low, because we don't keep the raw scores around for very long; they're compressed into the table as soon as possible.

However, it might be convenient not to define the digitizers until after we have all the raw scores, so that the boundaries could be learned from that data (as alluded to in Digitizer.from_data()). Keeping all the raw scores around for later digitizing would obviously take much more memory, but realistically this probably isn't much of a concern, and it might take some of the art out of defining ranges and boundaries.
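
For illustration, that deferred approach might look like the following (pure numpy; the raw score arrays are random stand-ins, not real NBLAST output):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for pooled raw scores from all query-target pairs:
raw_dists = rng.exponential(scale=5.0, size=100_000)
raw_dots = np.clip(np.abs(rng.normal(0.7, 0.2, size=100_000)), 0, 1)

# Learn boundaries from the data itself, e.g. evenly spaced quantiles,
# so that each interior bin holds roughly equal counts:
dist_bounds = np.quantile(raw_dists, np.linspace(0, 1, 11))
dot_bounds = np.quantile(raw_dots, np.linspace(0, 1, 11))

# Digitize once, afterwards, at the cost of holding all the raw scores:
counts, _, _ = np.histogram2d(raw_dists, raw_dots,
                              bins=[dist_bounds, dot_bounds])
print(counts.shape)  # (10, 10)
```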

@schlegelp (Collaborator) commented Nov 26, 2021

There might also be a bit of mileage if scoring were serialised. Currently that part of the process hardly registers, but it might with more complex scoring functions and/or bigger matrices.

@schlegelp (Collaborator)

I was wondering if we should provide a system similar to the bridging transforms that users/downstream packages can use to register scoring functions. Users could then invoke NBLAST with e.g. nblast(..., score_func='L1').

@clbarnes (Collaborator, Author)

I don't think there's much of a need to do that - the mapping from a score function name to the score function itself would just be a dict. If users already have a handle to the score function, they can use that directly, with no need to bring in our opinions on an additional naming scheme; or they can maintain the dict themselves if they're working with many scoring functions at the same time.
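
That is, something like the following sketch (the scoring callables are trivial stand-ins; in practice they'd be Lookup2d instances):

```python
# Stand-ins for user-built scoring functions:
def l1_smat(dist, dot):
    return dot - dist / 10  # placeholder scoring logic

def fcwb_smat(dist, dot):
    return dot - dist / 5  # placeholder scoring logic

# The user's own dict is the whole "registry"; no navis-level naming scheme:
score_funcs = {"L1": l1_smat, "FCWB": fcwb_smat}

smat = score_funcs["L1"]
print(smat(2.0, 0.9))
# ...which would then be passed straight through, e.g.
# navis.nblast(queries, targets, smat=smat)
```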

@clbarnes (Collaborator, Author)

I've just given this a go with the lateral pairs in seymour's brain. Building the score matrix doesn't take nearly as long as I thought it would, and said matrix looks pretty sane, even with all the bins determined automatically. That is, if the points are close enough they don't really care about orientation, but the further away they are, the more orientation matters. The distance boundaries increase in a somewhat geometric way; interestingly, the dot product boundaries are a sort of reverse geometric (the bins are large for low values and small for high values, i.e. we don't care whether two orientations are quite different or very different, but we do care whether they're similar or very similar).

clbarnes marked this pull request as ready for review on November 30, 2021 at 15:03
@clbarnes (Collaborator, Author)

I think this should be functional and documented, including an ipynb tutorial, but I'm not entirely sure whether I've got the indexing right for that - where could I find the PR build of the docs?

@schlegelp (Collaborator)

Quick question: I'd be keen to produce a new score matrix for synblast. I guess a 1-D (distance-only) matrix should be straightforward, but I'm trying to think how we could produce a scoring function with separate distances for pre- and postsynapses. For example, we might expect that postsynapses need to be closer than presynapses.

@clbarnes (Collaborator, Author) commented Dec 3, 2021

Yes, 1D is simple - just a LookupNd(Builder) with digitizers=[single_digitizer].

The clearest way to do it would probably be for pre and post to be two separate scoring functions. Possibly the SynBlaster object could take smat: Union[LookupNd, Tuple[LookupNd, LookupNd]]. This would be the only way to have different distance scales for the two.

However, if your distance scale was appropriate for both and you just wanted pre and post to drop into different slots, you could hack a Lookup2d for this purpose as well: if your point match function for two connector tables queried pre to pre and post to post, then returned a tuple of (distance, synapse_type), where synapse_type is e.g. 1.0 for pre and -1.0 for post, your lookup table would then have 2 columns and a distance bin on each row.

Alternatively, you could query all synapses against all synapses; your second axis could then be "is the nearest synapse of the same type?", and that column would basically be set to 0 if they're not. Or you could split it into 4 columns, in case there are different values for matching Q pre to T pre, Q pre to T post, Q post to T post, and Q post to T pre. The two-column hack from the previous paragraph is sketched below.
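
A sketch of that two-column hack, with an invented match function (match_synapses, the +1.0/-1.0 encoding as an array, and the random inputs are all illustrative, not part of this PR's API):

```python
import numpy as np
from scipy.spatial import cKDTree

def match_synapses(q_pre, q_post, t_pre, t_post):
    """Query pre against pre and post against post, returning
    (distance, synapse_type) with pre = 1.0 and post = -1.0."""
    dists, types = [], []
    for q, t, code in [(q_pre, t_pre, 1.0), (q_post, t_post, -1.0)]:
        d, _ = cKDTree(t).query(q)  # distance to nearest same-type synapse
        dists.append(d)
        types.append(np.full(len(q), code))
    return np.concatenate(dists), np.concatenate(types)

rng = np.random.default_rng(0)
q_pre, q_post, t_pre, t_post = (rng.uniform(0, 10, (20, 3)) for _ in range(4))
dist, syn_type = match_synapses(q_pre, q_post, t_pre, t_post)
# A lookup over (dist, syn_type) then has 2 columns (pre vs post)
# and a distance bin per row.
```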

@clbarnes (Collaborator, Author) commented Dec 7, 2021

The Digitizer now inherits from a base class (LookupAxis) and has a sibling, SimpleLookup, which can take non-float values. So your match function could return a float array of distances and a categorical array of synapse types, and the categories could be converted into indices with a SimpleLookup. For now, the Lookup2d convenience class still only handles a pair of (float) Digitizers, because its {to,from}_strings() isn't necessarily well defined for the arbitrary types that SimpleLookup can work with.
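
Conceptually, the categorical axis works along these lines (CategoricalAxis below is a stand-in; this sketch doesn't claim to match SimpleLookup's real interface):

```python
import numpy as np

class CategoricalAxis:
    """Stand-in for a SimpleLookup axis: maps arbitrary hashable values
    to bin indices, where a Digitizer only handles floats."""
    def __init__(self, categories):
        self._index = {c: i for i, c in enumerate(categories)}

    def __call__(self, values):
        return np.array([self._index[v] for v in values])

syn_axis = CategoricalAxis(["pre", "post"])
print(syn_axis(["pre", "post", "pre"]))  # [0 1 0]
```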

I think this is ready to go in, but as it's a pretty significant change I'll hold off until you get a chance to take a look!

@schlegelp (Collaborator) commented Dec 7, 2021

Thanks! Will attempt to have a look Wed or Thu!

@schlegelp (Collaborator) commented Dec 8, 2021

Looks good to me. One thing that came to my mind though is this: as is, NBLAST will complain if data is not in microns. That made sense as long as there was only the FCWB score matrix but might become a nuisance/confusing if people use their own. I can see two options to deal with this:

  1. Remove the microns-check for anything but the FCWB score matrix (assuming that people will know appropriate units with custom matrices).
  2. Make data in microns a soft convention by (a) mentioning it in the tutorial and docstrings and (b) have LookupNdBuilder also complain if neurons aren't in microns.

I'm slightly in favour of option 1.

Otherwise happy for you to proceed and merge.

@clbarnes (Collaborator, Author) commented Dec 8, 2021

Digitizers could optionally have a unit defined on them at instantiation; the Nblaster could then check the Lookup's digitizers for such constraints.

This would work fine for current usage, I think. At the end of the day, the match function, lookup table, and lookup axes are all pretty tightly bound, and the blasting algorithm would have to change a lot before this level of genericness became a problem.

@clbarnes (Collaborator, Author) commented Dec 8, 2021

Hm, this turned into a whole thing 😆 The sensible place to do unit checking is when dotprops are appended to the Nblaster, but partitioned nblasts, where the same dotprops may be added to multiple nblasters, could make this very noisy. I see check_microns jumps through some hoops to avoid parsing unit/quantity strings too often, too, which wouldn't be possible here.

Currently I'm going with adding an ignore_units kwarg to NBlaster.append.

To make things a little more complicated, and as I think you've found already, pint has Quantity("1 micron") == Quantity("1 micrometer") but doesn't treat the two as aliases: i.e. Unit("micron") != Unit("micrometer"). You can manually set them as aliases in the registry (ureg.define("@alias micron = micrometer")), which fixes the __eq__ check but not the __hash__ - i.e. they still come up as different in sets (which, say, Unit("micrometer") and Unit("um") do not).
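
That behaviour, as a runnable demonstration (outputs are as reported above for the pint versions current in this thread; later releases may differ):

```python
import pint

ureg = pint.UnitRegistry()

# Quantities convert between the units, so they compare equal:
print(ureg.Quantity("1 micron") == ureg.Quantity("1 micrometer"))  # True
# ...but the Unit objects are distinct names, not aliases:
print(ureg.Unit("micron") == ureg.Unit("micrometer"))  # False

# Registering the alias fixes __eq__ but not __hash__:
ureg.define("@alias micron = micrometer")
units = {ureg.Unit("micron"), ureg.Unit("micrometer"), ureg.Unit("um")}
print(len(units))  # 2: micron still hashes apart; micrometer/um collapse
```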

Non-isometric units are also a bit tricky; it would be nice to just unitise that all away before it gets anywhere close to NBLAST, but again, doing that multiple times would slow things down.

@clbarnes (Collaborator, Author) commented Dec 8, 2021

Maybe there could be a force_units(neurons: Sequence[BaseNeuron], units: pint.Unit) -> NeuronList function, which:

  1. adds the unit to dimensionless neurons (possibly with a warning)
  2. rescales neurons to use that unit isometrically
  3. possibly tries to figure out a consensus unit if the given units are dimensionless, or at least throws an error if there is unit heterogeneity among the neurons

This function could basically be used wherever check_microns is used now. If it were implemented on NeuronList then it could make use of existing infrastructure for parallelisation and so on.
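
A sketch of that proposal (Neuron below is a minimal stand-in for BaseNeuron, and only the units bookkeeping is shown; real code would also rescale the coordinates):

```python
import warnings
import pint

ureg = pint.UnitRegistry()

class Neuron:
    """Minimal stand-in for BaseNeuron: only carries units."""
    def __init__(self, units=None):
        self.units = None if units is None else ureg.Unit(units)

def force_units(neurons, units):
    target = ureg.Unit(units)
    seen = {n.units for n in neurons if n.units is not None}
    if len(seen) > 1:  # unit heterogeneity is an error
        raise ValueError(f"Heterogeneous units among neurons: {seen}")
    for n in neurons:
        if n.units is None:  # dimensionless: label it, with a warning
            warnings.warn(f"Assuming dimensionless neuron is in {target}")
            n.units = target
        elif n.units != target:  # rescale to the requested unit
            factor = ureg.Quantity(1, n.units).to(target).magnitude
            # real code would apply `factor` to coordinates, radii, etc.
            n.units = target
    return neurons

neurons = force_units([Neuron(), Neuron("nanometer")], "micrometer")
```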

@schlegelp (Collaborator) commented Dec 8, 2021

Oh haha, I didn't mean for you to go down that rabbit hole.

Re a force_units function: Not a fan of implicit conversion of units if I'm being honest - a warning for dummies like me who sometimes forget to convert should be sufficient.

Perhaps the NBlaster or LookupND could have a validate(self, neurons) or preflight(self, neurons) method which is called by e.g. nblast() before appending the neurons?

Re check_microns: yes, I converged on the current implementation after seeing that using the pint objects (a) causes issues and (b) slows things down a lot for large neuron lists. If you end up going with pint.Units, I highly recommend testing with a couple thousand neurons at a time to make sure this won't become a huge bottleneck. FWIW, check_microns could also be generalised to something like check_units(NeuronList, units=('um', 'microns', 'micrometer')).
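
That generalisation might look like this sketch (check_units is the name suggested above; the .units attribute access is schematic, not navis's real API):

```python
def check_units(neurons, units=("um", "microns", "micrometer")):
    """Compare plain unit strings rather than constructing pint objects,
    which gets slow for lists of thousands of neurons."""
    allowed = set(units)
    return all(str(getattr(n, "units", "")) in allowed for n in neurons)

class FakeNeuron:  # minimal stand-in to exercise the sketch
    units = "um"

print(check_units([FakeNeuron(), FakeNeuron()]))  # True
```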

Re microns vs micrometer: in theory, microns should never show up as units. See this line in the setter method:

v = v.replace('microns', 'um').replace('micron', 'um')

Sorry for the messy collection of thoughts. At the end of the day, I'd be perfectly happy with only checking units when smat='auto' (which we should perhaps rename to something like smat='FlyCircuit', actually).

@clbarnes (Collaborator, Author) commented Apr 4, 2022

I put this aside as I ran into a couple of pint bugs, which have since been resolved.

The fixes are due to be released any day now (hgrecco/pint#1450). However, that release also drops support for Python 3.7 👀

@schlegelp (Collaborator)

FWIW: I'm game for dropping 3.7

@clbarnes (Collaborator, Author)

I've stripped out the unit-checking for now as I think this is viable without it; it could be added later with the new pint release (which has just come out).

@schlegelp (Collaborator)

👍 ready to merge when you are!

clbarnes merged commit 7437726 into navis-org:master on Apr 12, 2022